Chat Completions API Reference
The Chat Completions API is the core of our text generation service. Given a list of messages comprising a conversation, the model returns a generated response. This makes it ideal for building chatbots and other conversational AI applications.
Request Body
The request body must be a JSON object with the following parameters. A short request sketch using several of them follows the table.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The ID of the model to use. |
| messages | array | Yes | A list of message objects that form the conversation history. See the structure below. |
| temperature | number | No | Controls randomness. A value between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. Defaults to 1. |
| max_tokens | integer | No | The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length. |
| stream | boolean | No | If true, the API sends back partial message deltas as they are generated, which is useful for creating a real-time, responsive user experience. See the Streaming Response section below. Defaults to false. |
| top_p | number | No | An alternative to temperature sampling, called nucleus sampling: the model considers only the smallest set of tokens whose cumulative probability reaches top_p. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered. Defaults to 1. |
| n | integer | No | How many chat completion choices to generate for each input message. Note that you are billed for the tokens generated across all choices. Defaults to 1. |
| stop | string or array | No | Up to 4 sequences where the API will stop generating further tokens. |
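As a minimal sketch of these parameters in use, the request below sets several of the optional fields using the OpenAI Python client and base URL from the Example Request section later on this page; the specific values are illustrative, not recommendations.

```python
import os
from openai import OpenAI

# Client setup matches the Example Request section below.
client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    temperature=0.2,   # low randomness: focused, near-deterministic output
    max_tokens=64,     # cap on generated (not input) tokens
    n=2,               # two independent completions from one request
    stop=["\n\n"],     # stop generating at the first blank line
)

# One choice per requested completion, each with its own index.
for choice in completion.choices:
    print(choice.index, choice.message.content)
```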
The messages Object
The messages array is the heart of the request, providing the conversation history that the model will use to generate a response. Each message object in the array has a role and content.
| Role | Description |
|---|---|
| system | The system message helps set the behavior of the assistant. It can be used to provide high-level instructions for the conversation, like "You are a helpful assistant that translates English to French." |
| user | A message from the user. This is where you provide the prompts and questions for the assistant. |
| assistant | A message from the assistant. This can be used to provide examples of desired behavior (few-shot prompting) or to continue a conversation. |
A typical conversation starts with a system message, followed by an alternating series of user and assistant messages.
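For instance, a messages array for the translation assistant described above might combine a system instruction with one worked example (few-shot prompting) before the real question. This Python sketch is purely illustrative:

```python
# A conversation history with a system instruction and one worked
# example (few-shot prompting) before the actual question.
messages = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    # Example exchange demonstrating the desired behavior:
    {"role": "user", "content": "Translate: Good morning."},
    {"role": "assistant", "content": "Bonjour."},
    # The real request the model should answer next:
    {"role": "user", "content": "Translate: Where is the train station?"},
]
```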
Response Body
The API returns a JSON object containing the completion choices.
| Parameter | Type | Description |
|---|---|---|
| id | string | A unique identifier for the chat completion. |
| object | string | The object type, which is always chat.completion. |
| created | integer | The Unix timestamp (in seconds) of when the completion was created. |
| model | string | The model used for the completion. |
| choices | array | A list of chat completion choices. |
| usage | object | An object containing token usage statistics for the completion. |
The choices object
| Parameter | Type | Description |
|---|---|---|
| index | integer | The index of the choice in the list of choices. |
| message | object | The message object generated by the model, containing role and content. |
| finish_reason | string | The reason the model stopped generating tokens: stop (the model finished or reached a stop sequence), length (it reached max_tokens), or content_filter (content was withheld by a content filter). A handling sketch follows the table. |
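Because length means the completion was truncated at max_tokens, callers usually check finish_reason before trusting the output. A minimal sketch, with client setup as in the Example Request section below:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_tokens=50,  # deliberately small, to make truncation likely
)

choice = completion.choices[0]
if choice.finish_reason == "length":
    # The output was cut off at max_tokens; retry with a larger
    # budget or continue the conversation to get the rest.
    print("Truncated:", choice.message.content)
elif choice.finish_reason == "content_filter":
    print("The response was withheld by the content filter.")
else:  # "stop": finished naturally or hit a stop sequence
    print(choice.message.content)
```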
The usage object
| Parameter | Type | Description |
|---|---|---|
| prompt_tokens | integer | The number of tokens in the prompt. |
| completion_tokens | integer | The number of tokens in the generated completion. |
| total_tokens | integer | The total number of tokens used in the request (prompt + completion). |
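For completeness, a minimal sketch of reading these fields with the OpenAI Python client used in the Example Request section below:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

# usage mirrors the table above; total is prompt + completion.
u = completion.usage
print(f"prompt={u.prompt_tokens} completion={u.completion_tokens} total={u.total_tokens}")
```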
Example Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
Example Request
Here is an example request to the Chat Completions API in Python, JavaScript, and cURL.
Python

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(completion.choices[0].message.content)
```
JavaScript

```javascript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://api.inceptron.io/v1",
  apiKey: process.env.INCEPTRON_API_KEY,
});

const chatCompletion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
});

console.log(chatCompletion.choices[0].message.content);
```
cURL

```bash
curl https://api.inceptron.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INCEPTRON_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
The example request above returns a JSON response like the following.
Standard Response
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nParis is the capital of France."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
Streaming Response
When stream is set to true, the API returns a stream of server-sent events: JSON chunk objects, each prefixed with "data: ", with a final "data: [DONE]" event terminating the stream. A Python sketch for consuming the stream follows the example below.
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"
"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"Paris"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" France"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
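Here is a minimal sketch of consuming such a stream with the OpenAI Python client from the Example Request section; the client parses the "data:" events into chunk objects for you.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptron.io/v1",
    api_key=os.environ["INCEPTRON_API_KEY"],
)

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,  # receive chat.completion.chunk objects as they arrive
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # The first chunk carries only the role and the last only a
    # finish_reason, so content may be None.
    if delta.content is not None:
        print(delta.content, end="", flush=True)
print()
```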